Building your own communication network over I2P

With the modern trend toward blanket surveillance and the collection of every kind of information, secure means of communication are more relevant than ever. Encrypting the transmitted data only partially solves the problem, because the mere fact that participants exchanged information can matter more than its content.

In most modern systems, be it email, ICQ or Twitter, the owner of the servers holds all of this data and can hand it over on receiving a formal or informal request. Below is a design for a network built on top of I2P in which the owner uses his nodes only to make operation more stable and to act as gateways to the regular Internet, and has no more information than ordinary I2P nodes.


Let's consider the mechanisms for ensuring anonymity and confidentiality of I2P, which we will rely on to build our network:
  1. Each I2P participant consists of a router, which is known to the rest of the network, and one or more addresses, which form the actual “invisible” network. The point of I2P is that it is practically impossible to find out which router a particular address is located on.
  2. An I2P address is a pair of public keys, one for asymmetric encryption and one for signing. The matching private keys are kept by the owner and prove the authenticity of the address. In other words, authorization relies on this key file instead of passwords; it is an analogue of an electronic digital signature and can, if necessary, be implemented as a token.
  3. Connections between routers are encrypted with AES. The session key is negotiated in several steps, including verification of the host address signature to counter man-in-the-middle attacks.


As shown previously, I2P is actually a two-layer design: a router that provides communication with other routers and tunnels, and protocols that transfer data between applications. While the router protocols seem carefully thought out and effective, the application protocols leave much to be desired: they are a jumble of different concepts and ideas, driven by the desire to make them as universal as possible and “transparent” to existing applications. In our case the task is much simpler: we assume exchange only between our own clients, so we can use our own protocol.

Another problem with I2P is that an attempt to reach an address can fail with an “address not found” error even though the resource at that address is online. This happens because the local network database is incomplete, for example right after startup, when the stored information about many routers is outdated and takes time to refresh. Since addresses publish their LeaseSets on the floodfills “closest” to them, the client may simply not have the necessary floodfills in its database yet. Our clients will use a second network database that contains only the nodes corresponding to our servers, and will publish their LeaseSets only on these nodes, so that they can find each other's LeaseSets immediately.
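The lookup mismatch described here can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: "closeness" is modeled as plain XOR distance between 32-byte hashes, the daily routing-key rotation of real I2P is ignored, and all node labels and helper names are invented for the example.

```python
import hashlib


def h(label: str) -> bytes:
    """Stand-in for a 32-byte identity hash (hypothetical test data)."""
    return hashlib.sha256(label.encode()).digest()


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance between two hashes."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest(target: bytes, known: list) -> bytes:
    """Return the known node whose hash is closest to the target."""
    return min(known, key=lambda node: xor_distance(node, target))


destination = h("some-destination")
all_floodfills = [h(f"floodfill-{i}") for i in range(50)]

# The publisher knows every floodfill; the reader has an incomplete netDb
# that happens to lack exactly the floodfill the publisher picked.
publisher_view = all_floodfills
published_at = closest(destination, publisher_view)
reader_view = [ff for ff in all_floodfills if ff != published_at]
looked_up_at = closest(destination, reader_view)
regular_lookup_hits = published_at == looked_up_at  # the reader asks the wrong node

# With a fixed, fully known list of our own servers, both sides
# always compute the same node, whatever order they store the list in.
our_servers = [h(f"our-server-{i}") for i in range(3)]
private_lookup_hits = (
    closest(destination, our_servers)
    == closest(destination, list(reversed(our_servers)))
)
```

In the first case the reader queries a floodfill that never received the LeaseSet; in the second, publisher and reader agree because both hold the complete server list.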

Each I2P node is identified by an I2P address: two pairs of public and private keys, randomly generated when the node is created, with no correlation to the IP address or location. There is no central source of addresses; the probability of two randomly generated addresses matching is assumed to be negligible. The owner of the node is whoever holds the file with the complete set of keys. The two public keys and a 3-byte certificate (currently always null) form a 387-byte node ID, by which the node becomes known in I2P. Because the full 387-byte identifier is inefficient to compare, sort and transfer, the 32-byte SHA-256 hash of the identifier is used instead, and this hash is what we use to identify a client. Since the address contains the signing key, it is hard for an attacker to impersonate another client: doing so is equivalent to finding a key pair whose hash matches the given identifier. If necessary, a client can confirm that he is the one behind an I2P address by signing a document with his key.
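As a rough illustration of the sizes involved, the sketch below assembles a 387-byte identifier from a 256-byte encryption key, a 128-byte signing key and a 3-byte null certificate, then derives the 32-byte hash. Random bytes stand in for real public keys, and the function names are made up for this example; the base32 form is the familiar `.b32.i2p` address.

```python
import base64
import hashlib
import os

ENC_KEY_LEN = 256          # public encryption key, bytes
SIGN_KEY_LEN = 128         # public signing key, bytes
NULL_CERT = b"\x00\x00\x00"  # 3-byte null certificate


def make_identity(enc_pub: bytes, sign_pub: bytes) -> bytes:
    """Concatenate the two public keys and the certificate into the node ID."""
    if len(enc_pub) != ENC_KEY_LEN or len(sign_pub) != SIGN_KEY_LEN:
        raise ValueError("unexpected key length")
    return enc_pub + sign_pub + NULL_CERT


def ident_hash(identity: bytes) -> bytes:
    """32-byte SHA-256 hash used as the short identifier."""
    return hashlib.sha256(identity).digest()


def b32_address(identity: bytes) -> str:
    """Lowercase, unpadded base32 of the hash with the .b32.i2p suffix."""
    encoded = base64.b32encode(ident_hash(identity)).decode("ascii")
    return encoded.rstrip("=").lower() + ".b32.i2p"


# Random bytes are placeholders, not valid keys.
identity = make_identity(os.urandom(ENC_KEY_LEN), os.urandom(SIGN_KEY_LEN))
short_id = ident_hash(identity)
address = b32_address(identity)
```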

So, our network will consist of client programs running on computers, and of servers that belong to us. Both clients and servers are full-fledged I2P routers. The servers are declared high-speed and are designed primarily to carry transit traffic, while clients mainly use their own tunnels and carry transit traffic only to mask their activity. Information about the servers is public and known to clients, but the servers know nothing about the clients and cannot distinguish them from regular I2P nodes. Clients will select tunnel nodes so that each tunnel contains exactly one of our servers and the remaining nodes belong to other participants of regular I2P. Even if all our servers are under an attacker's control, a single node is not enough to determine the other end of a 3-hop tunnel, the standard length in I2P. The user will always be able to see tunnel routes and to exclude suspicious nodes.
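The hop selection rule can be sketched as follows. This is a minimal illustration, not the project's code: `pick_tunnel_hops` and its parameters are hypothetical, routers are represented by plain strings, and real selection would also weigh capacity and reachability.

```python
import random


def pick_tunnel_hops(servers, regular_routers, length=3,
                     excluded=frozenset(), rng=random):
    """Pick `length` hops containing exactly one of our servers.

    The server is placed at a random position; all other hops come
    from regular I2P routers. `excluded` holds nodes the user has
    marked as suspicious.
    """
    server_set = set(servers)
    usable_servers = [s for s in servers if s not in excluded]
    usable_others = [r for r in regular_routers
                     if r not in excluded and r not in server_set]
    if not usable_servers or len(usable_others) < length - 1:
        raise ValueError("not enough usable nodes for a tunnel")
    hops = rng.sample(usable_others, length - 1)
    hops.insert(rng.randrange(length), rng.choice(usable_servers))
    return hops


# Hypothetical node names for demonstration.
servers = ["srv-a", "srv-b"]
routers = [f"router-{i}" for i in range(10)]
hops = pick_tunnel_hops(servers, routers, excluded={"router-3"},
                        rng=random.Random(0))
```

Passing a seeded `random.Random` makes the choice reproducible for testing; a real client would want a cryptographically strong source of randomness.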

On the other hand, having one of our servers in each tunnel makes tunnels more reliable, because tunnels that have stopped working are detected early. This is one of the fundamental problems of I2P: if a node agrees to participate in a transit tunnel and then stops working (for example, its user shuts it down), the tunnel creator knows nothing about it and keeps using the broken tunnel for a long time. Unlike regular I2P, our clients will actively send test messages into the tunnel, and as soon as our server detects a lack of traffic in the tunnel it will publish a notification to clients, allowing the client to stop using that tunnel immediately.
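The server-side detection could look roughly like the sketch below. Everything here is an assumption made for illustration: the class name, the use of integer tunnel IDs, and the idea of a fixed silence timeout. How the notification is then published to clients is not covered.

```python
class TunnelWatch:
    """Tracks when traffic was last seen in each transit tunnel."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._last_seen = {}  # tunnel_id -> timestamp of last message

    def on_traffic(self, tunnel_id: int, now: float) -> None:
        """Record a message (test or payload) passing through a tunnel."""
        self._last_seen[tunnel_id] = now

    def stale_tunnels(self, now: float) -> list:
        """Tunnels that have been silent for longer than the timeout."""
        return sorted(
            tunnel_id
            for tunnel_id, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


watch = TunnelWatch(timeout_s=30.0)
watch.on_traffic(1, now=0.0)
watch.on_traffic(2, now=0.0)
watch.on_traffic(1, now=25.0)   # client test message keeps tunnel 1 alive
stale = watch.stale_tunnels(now=40.0)
```

Because clients send test messages at a known rate, silence longer than a few test intervals is a reasonable signal that an upstream hop has died. Timestamps are passed in explicitly so the logic can be tested without a real clock.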

To exchange data between our clients we can use I2NP message type 20 (Data), which carries arbitrary data, or message type 11 (Garlic). Originally, I2P assumed the following exchange scheme between addresses: the sender requested the recipient's LeaseSet, generated a Garlic message with the address as the destination, encrypted it with the public encryption key from the LeaseSet and sent it into the appropriate tunnel. The router, on receiving such a message, decrypted it and only then determined whom it was intended for. In that scheme, however, the encryption key had to be the same for all addresses hosted on a given router, which created a large security hole. In the modern implementation of I2P, therefore, each address has its own set of incoming tunnels and its own encryption key, so the router can determine the address even without the “garlic” message. By not using garlic encryption we get rid of yet another cumbersome I2P construction, the AES/ElGamal engine, and can use encryption that is more efficient for our purposes, while still sending type 11 messages so that our traffic is indistinguishable from regular I2P.
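To make the message types concrete, here is a sketch of I2NP framing only, with no encryption. It follows the standard 16-byte header layout as I understand it from the I2NP specification (type, message ID, expiration in milliseconds, payload size, one-byte checksum taken from the SHA-256 of the payload); treat the layout as an assumption to verify against the specification. The function names are invented for the example.

```python
import hashlib
import struct

I2NP_GARLIC = 11  # message type numbers named in the text
I2NP_DATA = 20


def i2np_message(msg_type: int, msg_id: int, payload: bytes,
                 expires_ms: int) -> bytes:
    """Prefix a payload with a 16-byte header (layout assumed, see above)."""
    checksum = hashlib.sha256(payload).digest()[0]
    header = struct.pack(">BIQHB", msg_type, msg_id, expires_ms,
                         len(payload), checksum)
    return header + payload


def data_payload(data: bytes) -> bytes:
    """Body of a Data message: 4-byte length followed by the bytes."""
    return struct.pack(">I", len(data)) + data


body = data_payload(b"hello")
message = i2np_message(I2NP_DATA, msg_id=1, payload=body, expires_ms=0)
```

Under the scheme in the text, the client's own encrypted blob would travel inside a message whose type byte is 11, so an observer who sees only the framing cannot tell it from ordinary garlic traffic.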


Clients can exchange mail both with each other inside the network and with external recipients. In the first case, I2P addresses are used directly, and messages are sent through the tunnels listed in the recipient's LeaseSet. If the client cannot find a LeaseSet for the address, it will keep trying for a certain time and then generate a non-delivery notice.

In the second case, the client should use one of our servers as its outgoing SMTP server. Each of our servers will have its own address, and each client's address will correspond to a username assigned by the server; together they form a valid email address. A client that wants to send mail outside the network must find the server's LeaseSet (which it is guaranteed to find) and send the message there; the server will recognize the message as mail and forward it to the recipient like a regular SMTP server. The recipient will learn only the address of our SMTP server, and even if someone asks us who is hiding behind a given mail address, the most we can reveal is the I2P address, and we do not know whose address that is. When the server receives a message from outside, it uses the username to look up the I2P address and then delivers the message in the usual way within our network.
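The gateway's bookkeeping amounts to a two-way mapping between assigned usernames and I2P address hashes. The sketch below shows only that mapping; the class, its method names and the domain `gateway.example` are hypothetical, and actual SMTP handling is left out.

```python
class MailGateway:
    """Maps server-assigned usernames to I2P address hashes and back."""

    def __init__(self, domain: str):
        self.domain = domain
        self._hash_by_name = {}
        self._name_by_hash = {}

    def register(self, name: str, ident_hash: bytes) -> None:
        """Assign a username to an I2P address; names are unique."""
        if name in self._hash_by_name:
            raise ValueError("username already taken")
        self._hash_by_name[name] = ident_hash
        self._name_by_hash[ident_hash] = name

    def outbound_sender(self, ident_hash: bytes) -> str:
        """Mail address shown to an external recipient."""
        return f"{self._name_by_hash[ident_hash]}@{self.domain}"

    def inbound_recipient(self, mail_address: str) -> bytes:
        """I2P address hash to deliver an incoming message to."""
        name, _, domain = mail_address.partition("@")
        if domain != self.domain:
            raise KeyError("not our domain")
        return self._hash_by_name[name]


gateway = MailGateway("gateway.example")
client_hash = b"\x01" * 32          # placeholder identity hash
gateway.register("alice", client_hash)
sender = gateway.outbound_sender(client_hash)
```

Note that the table contains nothing but hashes and names, which is exactly the limit of what the server operator could disclose.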

To combat spam, we will limit the number of messages that can be sent from each I2P address. To send messages outside the network, an address will have to register on a server and obtain its name, and we will require it to present a certificate produced by solving some resource-intensive computational task. This makes mass generation of addresses harder without creating problems for those who need only one or a few addresses.
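The text does not say which computational task is meant. One common choice is a hashcash-style puzzle, sketched below as an assumption: the client searches for a nonce such that the SHA-256 of its address hash plus the nonce starts with a given number of zero bits. Function names and the difficulty value are illustrative only.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(ident_hash: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap check done by the server: a single hash."""
    digest = hashlib.sha256(ident_hash + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(ident_hash: bytes, difficulty_bits: int) -> int:
    """Expensive search done by the client: about 2**difficulty_bits hashes."""
    for nonce in count():
        if verify(ident_hash, nonce, difficulty_bits):
            return nonce


client_hash = hashlib.sha256(b"example identity").digest()
nonce = solve(client_hash, difficulty_bits=12)  # low difficulty for the demo
accepted = verify(client_hash, nonce, difficulty_bits=12)
```

Because the puzzle is bound to the address hash, the work cannot be reused for another address, so each extra address costs the same amount of computation.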

Thus we get a network that, on the one hand, provides anonymity and confidentiality of the transmitted information, which cannot be disclosed without access to the client's computer, and, on the other hand, maintains a high level of trust between clients through cryptographic identification. Using only our own protocol between clients significantly simplifies the implementation and makes the network more reliable, while the new high-speed routers will improve the operation and throughput of I2P itself.

I would like to hear the opinion of the respected habr community about the proposed project as a whole, and first of all about potential attacks with the aim of de-anonymizing clients, as well as other weaknesses and vulnerabilities.

Comments 12

The project is interesting, but somehow the technical part is missing.
The technical part is primarily the implementation and modification of I2P. I didn’t want to go deeper into this so as not to overload the article with unnecessary details - even without them, in my opinion, it turned out to be somewhat cumbersome.
In my opinion, the title does not match the content. Reading “Building your own communication network”, I expect something about a network as such, that is, some basic layer that other software can then use. In most cases this is an IP or Ethernet network, or perhaps some other equally basic layer, usable by a program that knows nothing about I2P. What is described here is a very specialized thing, only for a program written specifically for this protocol.

In fact:
As I understand it, the main reason for building the network is to increase stability through early detection of failures of intermediate nodes. Why is this system better than sending keep-alive messages over TCP connections through a SOCKS proxy for I2P, or some kind of pings at the level of your own application over the regular I2P network, without special servers?

From my experience with I2P, these are the inconveniences:
1. High latency
2. Limited use of existing software: as far as I remember, there are either point-to-point connections via a SOCKS proxy, HTTP via an HTTP proxy, or something special.

The latency can partly be explained by pings and tunnel rebuilding, although I have not run into this in practice: I browsed internal sites without any noticeably large delays.

What I would like:
It would be nice to solve this problem: building a general-purpose network, either your own VPN (IP/Ethernet) or some wider IP network. Participants of this network could simply attach an additional network adapter to their computer. Then, for example, Jabber servers could talk to each other over their usual protocol, Seafile storage would work, you could ping a node with standard tools to see whether it is alive, or access the Internet through some gateway, not by redirecting traffic to an HTTP proxy but simply by setting up routing or connecting to a VPN server, again with conventional tools.

In this case, a client-side implementation would ensure the circulation of P2P traffic without the participation of central servers, and the central servers would distribute IP addresses and indicate correspondence. Conventionally, now IP 1.2.3.4 corresponds to the I2P address abcd and then traffic exchange between them occurs directly.
>Conventionally, now IP 1.2.3.4 corresponds to the I2P address abcd and then traffic exchange between them occurs directly.

And goodbye to all anonymity.
Perhaps I said it unclearly - IP addresses are private for this network.

That is, from the internal IP address you can learn the I2P address that corresponds to it, and nothing more.
Tun interface instead of using separate ports as is currently done in I2P.
This problem is being successfully solved by the way; this project is focused primarily on the application of I2P.
>Conventionally, now IP 1.2.3.4 corresponds to the I2P address abcd and then traffic exchange between them occurs directly
And it’s done quite simply.
1. When setting up the router, you specify a block of gray addresses to which i2p addresses will be mapped. One of these addresses is the router address.
2. The router raises the interface with the address from this block.
3. The router raises a DNS server, which resolves *.i2p to addresses from the gray block. With some small TTL.

After this, all software starts working, both server and client, including software that could not care less about any proxies or SOCKS.

All you have to do is sit down and write. :)
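The three steps in this comment can be sketched as an address allocator. This is only an illustration of the mapping logic under assumed names: the block `10.77.0.0/24`, the class and its methods are invented here, and the actual TUN interface and DNS server are not shown.

```python
import ipaddress


class I2PAddressMapper:
    """Hands out private IPs from a block for *.i2p names (sketch)."""

    def __init__(self, cidr: str):
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self.router_ip = next(self._free)   # step 1-2: the router's own address
        self._ip_by_name = {}
        self._name_by_ip = {}

    def resolve(self, name: str):
        """Step 3: what the local DNS server would answer for a *.i2p name."""
        if not name.endswith(".i2p"):
            raise ValueError("not an I2P name")
        if name not in self._ip_by_name:
            ip = next(self._free, None)
            if ip is None:
                raise RuntimeError("address block exhausted")
            self._ip_by_name[name] = ip
            self._name_by_ip[ip] = name
        return self._ip_by_name[name]

    def destination_for(self, ip):
        """Reverse lookup used when a packet arrives on the interface."""
        return self._name_by_ip[ip]


mapper = I2PAddressMapper("10.77.0.0/24")
first = mapper.resolve("abcd.i2p")
again = mapper.resolve("abcd.i2p")
```

A real implementation would also expire mappings, which is why the comment suggests a small TTL on the DNS answers.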
>As I understand it, the main reason for building a network is to increase stability through early detection of failures of intermediate nodes.

A more important reason is the deliberate completeness of the database on which our clients declare their LeaseSets, thereby guaranteeing that if a LeaseSet is published, it will always be found by another client.

>Why is this system better than sending keep-alive messages over TCP connections through a SOCKS proxy for I2P, or some kind of pings at the level of your own application over the regular I2P network, without special servers?

Tunnels work in one direction only, so you have to ping a pair of tunnels, and if the ping does not come back, it is unclear which of the two is dead.
In addition, the introduction of one known reliable node reduces the probability of tunnel failure by a third.
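The "by a third" figure can be checked with a short calculation. The model is an assumption added here: each ordinary hop fails independently with the same probability `p` during the tunnel's lifetime, and our server never fails.

```python
def tunnel_failure(p: float, unreliable_hops: int) -> float:
    """Probability that at least one unreliable hop fails."""
    return 1 - (1 - p) ** unreliable_hops


def relative_reduction(p: float, hops: int = 3) -> float:
    """Drop in failure probability when one hop is fully reliable."""
    plain = tunnel_failure(p, hops)
    with_server = tunnel_failure(p, hops - 1)
    return 1 - with_server / plain


reduction_at_5_percent = relative_reduction(0.05)
reduction_at_tiny_p = relative_reduction(1e-6)
```

For small `p` the failure probability is close to 3p for three ordinary hops and 2p for two, so the reduction approaches one third; for larger `p` it is somewhat less.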
>A more important reason is the deliberate completeness of the database on which our clients declare their LeaseSets, thereby guaranteeing that if a LeaseSet is published, it will always be found by another client.

If I understand correctly, a LeaseSet is roughly the information about which routers you can use to connect to a particular I2P address.
That is, roughly speaking, you are proposing a centralized route database to find the connection path quickly?

It seems to me that this deserves a separate check for vulnerabilities: it is not for nothing that the path consists of several hops, with the route deliberately concealed and the final recipient impossible to determine (that is, whether a given node is the recipient or just another intermediate node).
If there is a central database of addresses, a correlation may appear that shows how to find the end node, and knowing that, you could perhaps work out which IP address it runs on.
These are general considerations. I tried I2P a fairly long time ago and not for long, because ordinary software does not work with it and there is no one to talk to using the specialized software. Explain where I am wrong.

>Tunnels only work in one direction, so you have to ping a couple of tunnels, and if the ping does not return, then it’s unclear which of the two is dead.

Well, it’s normal if any of the tunnels falls - rebuild both. This doesn’t happen every minute so that it requires some big optimizations to reduce traffic overhead. And in terms of time, they can be rebuilt in parallel and it will take as much time as rebuilding one tunnel.

>In addition, the introduction of one known reliable node reduces the probability of tunnel failure by a third.

Well, the degree of anonymity is reduced just as much - this effect (even better) can be achieved if you set the tunnel settings to 1 less hop.
As for LeaseSet, it’s generally correct - this is the current set of incoming tunnels.
The routes can be anything, here the idea is different.
Let's say you want to publish a LeaseSet of your address, you should do this on the floodfill “closest” to you (in the sense of DHT). Done.
Now your friend wants to contact you and he will also look for your LeaseSet at the floodfill closest to you. But it may turn out that these floodfills are different for you and for him because at this moment you and he have different netDb contents.
We are going to publish on our servers.

>Well, it’s normal if any of the tunnels falls - rebuild both. This doesn’t happen every minute so that it requires some big optimizations to reduce traffic overhead. And in terms of time, they can be rebuilt in parallel and it will take as much time as rebuilding one tunnel.

This is exactly how it is done now. But building a tunnel is an expensive proposition. In addition, many routers have a limit on the number of transit tunnels. As a result, it turns out that there are a lot of unused tunnels hanging, because the lifetime of the tunnel is strictly 10 minutes.

>Well, the degree of anonymity is reduced just as much - this effect (even better) can be achieved if you set the tunnel settings to 1 less hop.

So our server can be located in an arbitrary place in the tunnel, so this is not equivalent to a decrease in one hop.
By the way, in some difficult cases (when the client is behind NAT and almost everything is blocked), we will have to extend the tunnels by one hop, making our server the first hop.
It would be better if you wrote that your implementation of I2P in C++ already works =) so maybe the developers would catch up
i2pd can be said to be a basic project that implements the I2P functionality itself, and this project is a meaningful add-on